Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

Authors

Zhe Wang

Xiaoyi Liu

Liangjian Chen

Limin Wang

Yu Qiao

Xiaohui Xie

Charless Fowlkes

Abstract

Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms for incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice.
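As a rough illustration of the ideas named in the abstract, the sketch below pools question word vectors with attention weights driven by POS tags and then scores each (image, question, candidate answer) triplet to pick one choice. It is not the authors' model: the tag weights in POS_WEIGHTS, the elementwise-product scoring rule, and all function names are hypothetical and chosen only to keep the example small.

```python
import math

# Hypothetical per-tag attention logits (an assumption, not taken from the paper):
# content words get more weight than function words.
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.5, "ADJ": 1.5}
DEFAULT_WEIGHT = 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pos_guided_pool(word_vectors, pos_tags):
    """Pool word vectors into one question vector, using POS-tag logits as attention."""
    logits = [POS_WEIGHTS.get(tag, DEFAULT_WEIGHT) for tag in pos_tags]
    attn = softmax(logits)
    dim = len(word_vectors[0])
    return [sum(a * vec[i] for a, vec in zip(attn, word_vectors)) for i in range(dim)]


def score_triplet(image_vec, question_vec, answer_vec):
    """Toy compatibility score: fuse image and question elementwise, then dot with the answer."""
    joint = [i * q for i, q in zip(image_vec, question_vec)]
    return sum(j * a for j, a in zip(joint, answer_vec))


def pick_answer(image_vec, question_vec, candidate_vecs):
    """Return the index of the highest-scoring candidate answer."""
    scores = [score_triplet(image_vec, question_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a trained system the tag weights and the scoring function would be learned from data; here they are fixed by hand so that the flow from tagged question words to a multiple-choice decision is easy to follow.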

Similar resources

Image-Question-Linguistic Co-Attention for Visual Question Answering

Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple-choice questions about a given image. We start by building a multilayer perceptron (MLP) model with question-grouped training and softmax loss. GloVe embeddings and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question coattention [...

ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related question, VQA returns a natural language answer. Since different questions inquire about the attributes of different image regions, generating correct answers requires the model to have question-guided attention, i.e., the attention on the regions correspond...

Segmentation Guided Attention Networks for Visual Question Answering

In this paper we propose to solve the problem of Visual Question Answering by using a novel segmentation-guided, attention-based network, which we call SegAttendNet. We use image segmentation maps generated by a Fully Convolutional Deep Neural Network to refine our attention maps, and use these refined attention maps to make the model focus on the parts of the image relevant to answering a question....

Automatic Multi-Layer Corpus Annotation for Evaluation Question Answering Methods: CBC4Kids

Reading comprehension tests are receiving increased attention within the NLP community as a controlled test-bed for developing, evaluating and comparing robust question answering (NLQA) methods. To support this, we have enriched the MITRE CBC4Kids corpus with multiple XML annotation layers recording the output of various tokenizers, lemmatizers, a stemmer, a semantic tagger, POS taggers and syn...

Automatic Multi-Layer Corpus Annotation for Evaluating Question Answering Methods: CBC4Kids

Journal:

CoRR

Volume abs/1801.07853, issue -

Pages -

Publication date: 2018
